Towards a GDPR-compliant cloud architecture with data privacy controlled through sticky policies

Data privacy is one of the biggest challenges facing system architects at the system design stage, especially when cloud environments have to comply with laws such as the General Data Protection Regulation (GDPR). In this article, we want to help cloud providers comply with the GDPR by proposing a GDPR-compliant cloud architecture. To do this, we use model-driven engineering techniques to design the cloud architecture and analyze cloud interactions. In particular, we develop a complete framework, called MDCT, which includes a Unified Modeling Language profile that allows us to define specific cloud scenarios, and a profile validation that ensures that certain required properties are met. The validation process is implemented through Object Constraint Language (OCL) rules, which allow us to describe the constraints in these models. To comply with many GDPR articles, the proposed cloud architecture considers data privacy and data tracking, enabling safe and secure data management and tracking in the context of the cloud. For this purpose, sticky policies associated with the data are incorporated to define permission for third parties to access the data and to track instances of data access. As a result, a cloud architecture designed with MDCT contains a set of OCL rules to validate it as a GDPR-compliant cloud architecture. Our tool models key GDPR points such as user consent/withdrawal, the purpose of access, and data transparency and auditing, and considers data privacy and data tracking with the help of sticky policies.


A COMPLETE ARCHITECTURAL MODEL
As Fig. A.1 shows, the DataCenterElement data type is included to represent a set of data centers with the same configuration, and likewise the RackElement data type for racks. The profile definition includes the attributes necessary for the component stereotypes to simulate different system component specifications, such as the number of cores in a CPU or the machines per board in a rack (machinesPerBoard). As can be seen, each DataCenter is composed of a set of RackElements, each of which contains a set of racks. Each rack component is defined by specifying the machines per board, the network, and the boards (see Rack component). A rack can be dedicated to computing or storage, so two types of racks are defined, namely ComputingRack and StorageRack, which contain stateless computation machines (StatelessComputationMachine stereotype) or stateless storage machines (SSMProcessor stereotype), respectively. Each machine is defined in terms of CPU (CPU stereotype), memory (Memory), and storage (Storage). As can be seen in the bottom right of Fig. A.1, the data is associated with the Storage stereotype, which is an attribute of the machines where it will be stored. Thus, it is associated with storage and computation machines.

Gigabytes, or Terabytes (right part). Latency requires a name of type string and an attribute of type Time.
Finally, the remaining attributes consist of primitive data types, mainly integer and string, except for the cloudProvider attribute of the Infrastructure stereotype, which is of type ControllerCP, defined for the interaction. All of these must be parameterized when defining the model.

Other than the relationships between User and Data, ControllerCP and Data, and SSMProcessor and StatelessAppCTP, which are regular binary relationships, all other associations model the ownership of the (opposite) end of the association. This association means that the stereotype connected by the dotted arrow will become an attribute of the stereotype associated with it (the former is owned by the latter). Therefore, most attributes are specified by another stereotype or by user-defined data types, as illustrated by the StickyPolicy stereotype. This stereotype is made up of the following attributes: permission, owners, purpose, controller, and accessHistory. The permission attribute is required for defining restrictions (permissions) on data usage. This attribute is of the PermissionPerTP data type, which is used to define who is authorized to grant permissions for data access (S), and who has obtained permission for writing the data (I), both being defined as a list of lists of tps or Users. For this purpose, the Principal stereotype, which can be a User or a tp, is defined (see Section 5.2). Then, to create the list of lists, it is necessary to create a data type that establishes the first list of principals, i.e., PList. Thus, S and I can each be defined as a list of elements of this type. The owners attribute, of PList type, establishes the user (or users in the case of combined data sets) who owns the data paired with this policy.
Then, the controller attribute, of type ControllerCP, indicates the data controller of the data. Note that no ad-hoc identification is required, as data processors usually use segmentation techniques to separate data from different data subjects. The purpose attribute has been extracted from point 1(c) of Article 13 GDPR and contains the required information, detailing the purposes for which the controller of the data allows the treatment of its data. Finally, the accessHistory attribute, of the AccessPerTP data type, is defined to specify all the third parties that access the data, thus allowing us to track the data and obtain information about who obtained permission for that access. The controller and owners attributes, of ControllerCP and User types, respectively, indicate the data controller and the user (or users in the case of combined data sets) who are the data owners.
The AccessPerTP stereotype is used in the accessHistory field of the sticky policy (SP) to track data accesses and their purpose. It has three attributes: tp, actionPerformed, and purpose. Note that the purpose attribute of the StickyPolicy stereotype must match the purpose recorded here, to model that a third party does not access the data for a purpose other than the one stated by the controller.
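The sticky policy structure described above can be sketched in Python as follows. This is only an illustrative rendering, not part of the MDCT profile: principals and controllers are reduced to plain strings, the purpose attribute is assumed to be a list of strings, and the helper purposes_respected is a hypothetical name that interprets "match" as membership of each recorded access purpose in the policy's list of purposes.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: a principal (User or tp) is identified by a plain string.
PList = List[str]


@dataclass
class PermissionPerTP:
    S: List[PList] = field(default_factory=list)  # who may grant access permissions
    I: List[PList] = field(default_factory=list)  # who obtained write permission


@dataclass
class AccessPerTP:
    tp: str               # third party that accessed the data
    actionPerformed: str  # e.g. "read" or "write"
    purpose: str          # purpose declared for this access


@dataclass
class StickyPolicy:
    permission: PermissionPerTP
    owners: PList
    purpose: List[str]    # assumption: purposes are modeled as strings
    controller: str       # assumption: controller reduced to an identifier
    accessHistory: List[AccessPerTP] = field(default_factory=list)


def purposes_respected(sp: StickyPolicy) -> bool:
    """Hypothetical check: every recorded access used a purpose stated in the policy."""
    return all(entry.purpose in sp.purpose for entry in sp.accessHistory)
```

With this sketch, appending an access whose purpose is not listed in the policy makes purposes_respected return False, which is the situation the purpose-matching requirement is meant to rule out.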
Another important stereotype is the AccessLog stereotype, which represents the log used by the controller to control where data is stored and to track data accesses. To capture this, a new entry will be included in the log for each access to the data. This log has the following attributes: tp, l1 (l for location), sp, O (for Owners), action, newl, and newsp. The tp attribute, of StatelessAppCTP type (where AppCTP stands for computing application developed by a third party), relates a data access to a third party and allows us to know who is responsible for the data access. The l1 attribute is of Storage type and represents the current location of the data being accessed. This attribute allows for more complete data tracking, as it links a data access to a machine. The sp attribute, of StickyPolicy type, records the initial sticky policy for the data treated, to detect possible alterations between the input and output data sets. The O attribute, of type list of Principals (PList), indicates who consents to the data access. The action attribute is of ActionType type and records the operation performed on the data, which can be a read or a write.
The newl attribute, of Storage type, specifies the location where the data has been stored after the action performed on it. Finally, the last property, namely newsp, of type StickyPolicy, contains the resulting policy on the data after the action. The value of this attribute when data are combined over two sets of data is shown in Section 5.2.
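A minimal sketch of such a log in Python is given below, assuming that locations and third parties are identified by strings and that the sticky policy is reduced to its owners and purposes. The helper names record_access and altered_entries are hypothetical; they only illustrate how keeping both sp and newsp in each entry allows policy alterations to be detected.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ActionType(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class StickyPolicy:
    # Assumption: a reduced policy, enough to compare input and output policies.
    owners: Tuple[str, ...]
    purpose: Tuple[str, ...]


@dataclass
class AccessLogEntry:
    tp: str              # third party responsible for the access
    l1: str              # current location of the accessed data
    sp: StickyPolicy     # policy before the action
    O: List[str]         # principals who consent to the access
    action: ActionType   # read or write
    newl: str            # location of the data after the action
    newsp: StickyPolicy  # policy after the action


def record_access(log: List[AccessLogEntry], entry: AccessLogEntry) -> None:
    """One new entry is appended per data access."""
    log.append(entry)


def altered_entries(log: List[AccessLogEntry]) -> List[AccessLogEntry]:
    """Entries whose resulting policy differs from the initial one."""
    return [e for e in log if e.sp != e.newsp]
```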
The SLA stereotype has six attributes that are modeled on the basis of Article 28 GDPR. This stereotype represents the contract that governs data processing, which the controller and processor are required to sign, in accordance with point 3 of the above article. The attributes of this stereotype are subjectMatter, processingDuration, recipients, processingNature, processingPurpose, and processingInstructions. The first two attributes, defined as an array of strings and a Time stereotype, respectively, set the subject matter and duration of the processing. The recipients attribute is defined as a list of StatelessAppCTP and represents the list of third parties who are allowed to access the data so far. The next two attributes, both defined as string arrays, are the nature and the purpose of the processing; the latter must match the purpose indicated in the SP defined by the user. Finally, the processingInstructions attribute models the set of directions given by the controller to regulate data processing.
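The SLA attributes listed above could be rendered in Python roughly as follows. The Time class, the string identifiers for third parties, the list-of-strings type for processingInstructions, and the helpers sla_matches_policy and is_recipient are assumptions made for illustration; in particular, "match" is interpreted here as equality of the sets of purposes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Time:
    # Hypothetical stand-in for the Time stereotype: a value and its unit.
    value: int
    unit: str


@dataclass
class SLA:
    subjectMatter: List[str]
    processingDuration: Time
    recipients: List[str]              # assumption: third parties identified by name
    processingNature: List[str]
    processingPurpose: List[str]
    processingInstructions: List[str]  # assumption: directions as plain strings


def sla_matches_policy(sla: SLA, policy_purpose: List[str]) -> bool:
    """Assumed reading of 'must match': same set of purposes in the SLA and the SP."""
    return set(sla.processingPurpose) == set(policy_purpose)


def is_recipient(sla: SLA, tp: str) -> bool:
    """True if the third party is among the recipients allowed so far."""
    return tp in sla.recipients
```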
The ControllerCP stereotype includes two attributes: resourceAllocationPolicy and idProvider. The first models the type of policy that the controller uses to allocate its resources. The second attribute, defined as a string type, models the information about the controller, i.e., the cloud service provider, that must be included in each contract as spContact. The remaining attributes result from the use of end classifiers in the associations of this stereotype. As stated above, these are represented by an arrow with a dot at one end of an association and indicate that the marked stereotype will be an attribute of the stereotype at the other end. It is also worth noting that the multiplicity of the end with the dot becomes that of the resulting attribute. Thus, having a multiplicity of one-or-many in the marked stereotype implies that the resulting attribute represents a set of elements of that type. Therefore, ControllerCP receives two attributes named accessLog and sla, of AccessLog and SLA types, respectively.
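To illustrate how an owned association end turns into an attribute, the following Python sketch gives ControllerCP its two own attributes plus the two derived ones. The concrete multiplicities are an assumption for illustration only: sla is treated as one-or-many (so an empty list is rejected) and accessLog as zero-or-many. The SLA and AccessLog classes are reduced to a single field each.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SLA:
    subjectMatter: List[str]  # reduced SLA, only for this sketch


@dataclass
class AccessLog:
    tp: str                   # reduced log entry, only for this sketch


@dataclass
class ControllerCP:
    resourceAllocationPolicy: str
    idProvider: str
    accessLog: List[AccessLog]  # derived from the owned end; assumed zero-or-many
    sla: List[SLA]              # derived from the owned end; assumed one-or-many

    def __post_init__(self) -> None:
        # A one-or-many multiplicity becomes a non-empty collection attribute.
        if not self.sla:
            raise ValueError("ControllerCP requires at least one SLA")
```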
In contrast, the few primitive type attributes in this diagram are mostly strings, as represented by the ControllerCP or SLA stereotypes.
The Data stereotype represents the data that belongs to a certain user or set of users (the latter only in the case of combined data). For this stereotype, it is necessary to include two specific data types, namely DataArchive and DataField. DataArchive models the structure of a data file and is composed of an identifier, idData, and its contents, contents. The contents of an archive consist of a group of fields (DataField type), each of which, in turn, contains a value, which is an attribute of string type. In addition, the Data stereotype includes the sticky policy that is applied to it (appliedPolicy attribute). The Storage attribute, in turn, is an attribute of Machine, which is abstract, so it will be inherited by the SSMProcessor and StatelessComputationMachine stereotypes. The processors represent the machines that store and maintain the data at all times, whereas the computing machines will only occasionally store data (provided by an SSMProcessor) when processing it via the StatelessAppCTP that requested such data.

Figure A.2
Figure A.2 shows the attributes and the relationships between the interaction stereotypes as associations of stereotypes, including the relationships between User and Data and between ControllerCP and Data.

Description: Validates that the set of data to rectify with the contents of the newData message is located in all of the machines to which the message is destined. This is achieved by verifying that, for all the machines in the list of the newData message (self.machines), the data included in the message (self.data) is included in every list of data inside the machine (m.data).
Specification: self.machines->forAll(m | m.data->includes(self.data))

Name: machine contains data to erase
Severity: ERROR
Context: eraseData
Description: Similarly to the previous rule, this one checks that the set of data to erase in the eraseData message is located in all of the destination machines of the message.
Specification: self.machines->forAll(m | m.data->includes(self.data))

Name: machine contains data to subscribe to
Severity: ERROR
Context: subscribe
Description: Like the former two rules, this one checks that the set of data to which the controller wants to subscribe is present in all of the destination machines of the message.
Specification: self.machines->forAll(m | m.data->includes(self.data))

Description: This rule checks that the processor contained in the accesslog from which data has been obtained for the operation is under SLA with the controller of said data. To do this, it accesses the list of accesslogs of the controller (self.accesslog) and checks, for all of them, that there exists at least one SLA in the controller list which is included in the SLA list of the location1 machine of the log (log.location1.sla).
Specification: self.accesslog->forAll(log | self.sla->exists(sla | log.location1.sla->includes(sla)))

Description: This rule validates that the machine containing the source copy of data is under SLA with the controller. First, it gets the list of SLAs for the controller included inside the sticky policy of the log of the controller (self.accesslog.sp.controller.sla); then it checks that there exists (exists operation) at least one SLA in said list which is included (includes operation) in the list of SLAs of the source machine contained in the same sticky policy of the log (self.accesslog.sp.sourceMachine.sla).

Description: In this rule, the list of third parties who accessed the data is first accessed; this is done through the sticky policy attribute (sp) of the controller's accesslog (self.accesslog.sp.accessHistory). Then, it is checked for all of them (forAll operation) that, for all the users (second forAll operation) in the list of owners (self.accesslog.sp.owners), the list of recipients of their user contract (ow.bindingContract.recipients) includes the third party in the accessHistory attribute (his.tp). In this way, it is ensured that data is not accessed by any tp that the users have not been informed of. Note that this could have been done with StickyPolicy as the starting point, but with the additional navigation the error is thrown by the controller, which is the entity that would manage this situation in a real scenario.

Description: This rule ensures that all of the machines included as destinations of a subscribe message are marked as compliant with the GDPR.
Specification: self.machines->forAll(m | m.GDPRCompliance = true)

Description: This rule is meant to ensure that the data introduced in the newData messages does not infringe the GDPR data accuracy principle by introducing empty fields. To do this, it is checked that, for all the fields in the newData attribute of the message (self.newData), the size (number of characters of the string) is greater than 0.
Specification: self.newData->forAll(f | f.value.size() > 0)

Description: ... units of time for the maximum storage time of data are smaller than usual. This is checked in exactly the same way as in the previous rule.
Specification: self.maxTime.unit = TimeUnit::h or self.maxTime.unit = TimeUnit::min

Name: newData destination machines comply with GDPR
Severity: ERROR
Context: newData
Description: This rule ensures that all of the machines included as destinations of a newData message are marked as compliant with the GDPR, just like rule 10 does for upDate.
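For readers less familiar with OCL, the following Python sketch mirrors the semantics of four of the specifications above: forAll corresponds to all(), exists to any(), and includes to the in operator. The classes are simplified stand-ins for the profile's stereotypes (data and SLAs are reduced to string identifiers), and the function names are hypothetical; the authoritative rules remain the OCL expressions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataField:
    value: str


@dataclass
class Machine:
    data: List[str] = field(default_factory=list)  # assumption: data identified by id
    sla: List[str] = field(default_factory=list)   # assumption: SLAs identified by id
    GDPRCompliance: bool = False


@dataclass
class Message:
    machines: List[Machine]
    data: str
    newData: List[DataField] = field(default_factory=list)


@dataclass
class AccessLog:
    location1: Machine


@dataclass
class ControllerCP:
    accesslog: List[AccessLog]
    sla: List[str]


def machines_contain_data(msg: Message) -> bool:
    # self.machines->forAll(m | m.data->includes(self.data))
    return all(msg.data in m.data for m in msg.machines)


def machines_gdpr_compliant(msg: Message) -> bool:
    # self.machines->forAll(m | m.GDPRCompliance = true)
    return all(m.GDPRCompliance for m in msg.machines)


def new_data_fields_non_empty(msg: Message) -> bool:
    # self.newData->forAll(f | f.value.size() > 0)
    return all(len(f.value) > 0 for f in msg.newData)


def processors_under_sla(ctrl: ControllerCP) -> bool:
    # self.accesslog->forAll(log | self.sla->exists(sla | log.location1.sla->includes(sla)))
    return all(
        any(sla in log.location1.sla for sla in ctrl.sla)
        for log in ctrl.accesslog
    )
```

Each function returns False exactly when the corresponding OCL invariant would be violated, which in the validation process would be reported with the severity attached to the rule.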